Combining Words and Speech Prosody for Automatic Topic Segmentation
نویسندگان
چکیده
We present a probabilistic model that uses both prosodic and lexical cues for the automatic segmentation of speech into topic units. The approach combines hidden Markov models, statistical language models, and prosody-based decision trees. Lexical information is obtained from a speech recognizer, and prosodic features are extracted automatically from speech waveforms. We evaluate our approach on the Broadcast News corpus, using standard evaluation metrics. Results show that the prosodic model alone outperforms the word-based segmentation method. Furthermore, we achieve an additional reduction in error by combining the prosodic and wordbased knowledge sources.
منابع مشابه
Prosody Modeling for Automatic Speech Recognition and Understanding
This paper summarizes statistical modeling approaches for the use of prosody (the rhythm and melody of speech) in automatic recognition and understanding of speech. We outline effective prosodic feature extraction, model architectures, and techniques to combine prosodic with lexical (word-based) information. We then survey a number of applications of the framework, and give results for automati...
متن کاملCombining words and prosody for information extraction from speech
Information extraction from speech is a crucial step on the way from speech recognition to speech understanding. A preliminary step toward speech understanding is the detection of topic boundaries, sentence boundaries, and proper names in speech recognizer output. This is important since speech recognizer output lacks the usual textual cues to these entities (such as headers, paragraphs, senten...
متن کاملProsody-based automatic segmentation of speech into sentences and topics
A crucial step in processing speech audio data for informationextraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segmentation is challenging, since the cues typically present for segmenting text (headers, paragraphs, punctuation) are absent in spoken language. We investigate the use of prosody (informationgleaned from the timing and m...
متن کامل60 36 v 1 2 7 Ju n 20 00 Prosody - Based Automatic Segmentation of Speech into Sentences and Topics
A crucial step in processing speech audio data for information extraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segmentation is challenging, since the cues typically present for segmenting text (headers, paragraphs, punctuation) are absent in spoken language. We investigate the use of prosody (information gleaned from the timing and...
متن کاملFully automatic segmentation for prosodic speech corpora
While automatic methods for phonetic segmentation of speech can help with rapid annotation of corpora, most methods rely either on manually segmented data to initially train the process or manual post-processing. This is very time-consuming and slows down porting of speech systems to new languages. In the context of prosody corpora for text-to-speech (TTS) systems, we investigated methods for f...
متن کامل